Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 4.4.0
Description
On larger clusters the Iceberg metadata scanner can be scheduled to executors, for example during a join. The fragment in this case will fail a precondition check, because either the frontend_ object or the table will not be present. Setting exec_at_coord to true is not enough; these fragments should be scheduled to the coord_only_executor_group.
Additionally, setting NUM_NODES=1 should be a viable workaround.
Reproducible with the following local dev Impala cluster:
./bin/start-impala-cluster.py --cluster_size=3 --num_coordinators=1 --use_exclusive_coordinators
and query:
select count(b.parent_id) from functional_parquet.iceberg_query_metadata.history a
join functional_parquet.iceberg_query_metadata.history b on a.snapshot_id = b.snapshot_id;
Issue Links
relates to: IMPALA-10947 SQL support for querying Iceberg metadata (In Progress)
Commit 9071030f7fc272520c26ddb793551987226a5693 in impala's branch refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9071030f7 ]
IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator

On clusters with dedicated coordinators and executors the Iceberg
metadata scanner fragment(s) can be scheduled to executors, for example
during a join. The fragment in this case will fail a precondition check,
because either the 'frontend_' object or the table will not be present.
This change forces Iceberg metadata scanner fragments to be scheduled on
the coordinator. It is not enough to set the DataPartition type to
UNPARTITIONED, because unpartitioned fragments can still be scheduled on
executors. This change introduces a new flag in the TPlanFragment thrift
struct - if it is true, the fragment is always scheduled on the
coordinator.
Testing:
joined together.
Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e
Reviewed-on: http://gerrit.cloudera.org:8080/21138
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
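
The scheduling rule described in the commit message can be illustrated with a small toy model. This is only a sketch and not Impala's scheduler code: the class names, the flag name is_coordinator_only, and the choice of "first executor" for unpartitioned fragments are all assumptions made for illustration. The commit message does not give the real name of the new TPlanFragment field. The sketch shows why an UNPARTITIONED data partition alone does not pin a fragment to the coordinator, while an explicit flag does.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class Partition(Enum):
    UNPARTITIONED = auto()
    HASH = auto()  # hypothetical stand-in for any partitioned fragment


@dataclass
class PlanFragment:
    name: str
    partition: Partition
    # Hypothetical name for the new flag; the real field name in the
    # TPlanFragment thrift struct is not stated in the commit message.
    is_coordinator_only: bool = False


def schedule(fragment: PlanFragment, coordinator: str,
             executors: List[str]) -> List[str]:
    """Toy model: return the hosts a fragment would run on."""
    # The flag takes precedence over everything else.
    if fragment.is_coordinator_only:
        return [coordinator]
    # Without dedicated executors, everything runs on the coordinator.
    if not executors:
        return [coordinator]
    # An unpartitioned fragment runs as a single instance, but that
    # instance may still land on an executor (assumed here: the first one).
    if fragment.partition is Partition.UNPARTITIONED:
        return [executors[0]]
    # Partitioned fragments are spread over all executors.
    return list(executors)


scan = PlanFragment("iceberg-metadata-scan", Partition.UNPARTITIONED)
before_fix = schedule(scan, "coord", ["exec1", "exec2"])

scan.is_coordinator_only = True
after_fix = schedule(scan, "coord", ["exec1", "exec2"])
```

In this model, before_fix places the metadata scan on an executor, where the frontend_ object or the table would be missing, while after_fix places it on the coordinator regardless of the partition type.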