Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 4.0.0, Impala 4.1.0
-
None
-
ghx-label-14
Description
HDFS provides a RemoteIterator for listing files. The two key interfaces are hasNext() and next(). We only handle the FileNotFoundException thrown in next(). However, hasNext() could also thrown FileNotFoundException for staging files:
- The HDFS client will do a partial listing when creating the RemoteIterator. By default it only list 1000 items under the dir. The limit is configured by "dfs.ls.limit".
- Usually the hasNext() call just consume the partial list and it's enough. However, if the dir contains more than 1000 items. The 1001-th hasNext() call will do a continue listing. This could hit FileNotFoundException if the dir was removed.
Such a FileNotFoundException thrown from hasNext() will break the metadata loading:
https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L948
Reproducing
To stably reproduce the issue, I use IntelliJ to stop catalogd at some point, and then manually delete the dir. Here are the detailed steps:
Create a table and run a simple query on it to let catalogd cache its metadata.
impala-shell> create table my_alltypes like functional.alltypes; impala-shell> desc my_alltypes;
Manually create staging dir with more than 1000 files in it.
hdfs dfs -mkdir -p hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 bash hdfs-touch-1002-files.sh
In IntelliJ, set a breakpoint at this line and attach to a running catalogd:
https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L987
In Hive, trigger an INSERT on this table.
$ beeline -u "jdbc:hive2://localhost:11050" beeline> insert into my_alltypes partition(year=2009, month=4) select id,bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col from functional.alltypes where year=2009 and month=4;
IntelliJ will prompt out for debugging. Step over until the partial listing is done on the staging dir "-ext-10001", as the screenshot shows.
Manually remove the staging dir "-ext-10001"
hdfs dfs -rm -r -skipTrash hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 Deleted hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
Then stop debugging to resume the process. I can see the following errors in catalogd. Event processor is broken and requires a global INVALIDATE METADATA to recover.
I0730 10:36:27.031431 9452 FileSystemUtil.java:837] Ignoring hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/996 since it is either in a hidden directory or a temporary staging directory hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1 I0730 10:36:27.031471 9452 FileSystemUtil.java:837] Ignoring hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/997 since it is either in a hidden directory or a temporary staging directory hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1 E0730 10:36:27.034335 9452 ParallelFileMetadataLoader.java:171] Loading file and block metadata for 1 paths for table default.my_alltypes encountered an error loading data for path hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4 Java exception follows: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 does not exist. at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513) at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86) at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:168) at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120) at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780) at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743) at org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904) at org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465) at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068) at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907) at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 does not exist. at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:1317) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:1288) at org.apache.impala.common.FileSystemUtil$RecursingIterator.hasNext(FileSystemUtil.java:948) at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:888) at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202) at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:162) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322) at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134) at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66) at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36) at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:162) ... 17 more E0730 10:36:27.034795 9452 MetastoreEventsProcessor.java:841] Event processing needs a invalidate command to resolve the state Java exception follows: org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException: EventId: 728392 EventType: ADD_PARTITION Failed to refresh newly added partitions of table {}. Event processing cannot continue. Issue an invalidate metadata command to reset event processor. at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1921) at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.impala.catalog.TableLoadingException: Loading file and block metadata for 1 paths for table default.my_alltypes: failed to load 1 paths. Check the catalog server log for more details. at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:185) at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120) at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780) at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743) at org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904) at org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465) at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068) at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907) ... 10 more I0730 10:36:27.622951 9547 catalog-server.cc:400] A catalog update with 1 entries is assembled. Catalog version: 1697 Last sent catalog version: 1696 I0730 10:36:27.625458 9536 catalog-server.cc:813] Collected update: 1:CATALOG_SERVICE_ID, version=1698, original size=60, compressed size=58 W0730 10:36:28.035472 9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391 W0730 10:36:29.035904 9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391 I0730 10:36:29.623289 9547 catalog-server.cc:400] A catalog update with 1 entries is assembled. Catalog version: 1698 Last sent catalog version: 1697 W0730 10:36:30.036418 9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391 W0730 10:36:31.036957 9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
Attachments
Attachments
Issue Links
- relates to
-
IMPALA-8663 FileMetadataLoader should skip listing files in hidden and tmp directories
- Resolved
-
IMPALA-9122 Ignore FileNotFoundException when loading a table
- Resolved
-
IMPALA-10579 Deadloop in table metadata loading when using an invalid RemoteIterator
- Resolved
- links to