Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-11464

hasNext() throws FileNotFoundException on staging files and breaks file metadata loading

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 4.0.0, Impala 4.1.0
    • Impala 4.2.0, Impala 4.1.1
    • Catalog
    • None
    • ghx-label-14

    Description

      HDFS provides a RemoteIterator for listing files. The two key interfaces are hasNext() and next(). We only handle the FileNotFoundException thrown in next(). However, hasNext() could also thrown FileNotFoundException for staging files:

      • The HDFS client will do a partial listing when creating the RemoteIterator. By default it only list 1000 items under the dir. The limit is configured by "dfs.ls.limit".
      • Usually the hasNext() call just consume the partial list and it's enough. However, if the dir contains more than 1000 items. The 1001-th hasNext() call will do a continue listing. This could hit FileNotFoundException if the dir was removed.

      Such a FileNotFoundException thrown from hasNext() will break the metadata loading:
      https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L948

      Reproducing

      To stably reproduce the issue, I use IntelliJ to stop catalogd at some point, and then manually delete the dir. Here are the detailed steps:

      Create a table and run a simple query on it to let catalogd cache its metadata.

      impala-shell> create table my_alltypes like functional.alltypes;
      impala-shell> desc my_alltypes;
      

      Manually create staging dir with more than 1000 files in it.

      hdfs dfs -mkdir -p hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
      bash hdfs-touch-1002-files.sh
      

      In IntelliJ, set a breakpoint at this line and attach to a running catalogd:
      https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L987

      In Hive, trigger an INSERT on this table.

      $ beeline -u "jdbc:hive2://localhost:11050"
      beeline> insert into my_alltypes partition(year=2009, month=4) select id,bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col from functional.alltypes where year=2009 and month=4;
      

      IntelliJ will prompt out for debugging. Step over until the partial listing is done on the staging dir "-ext-10001", as the screenshot shows.

      Manually remove the staging dir "-ext-10001"

      hdfs dfs -rm -r -skipTrash hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
      Deleted hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
      

      Then stop debugging to resume the process. I can see the following errors in catalogd. Event processor is broken and requires a global INVALIDATE METADATA to recover.

      I0730 10:36:27.031431  9452 FileSystemUtil.java:837] Ignoring hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/996 since it is either in a hidden directory or a temporary staging directory hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1
      I0730 10:36:27.031471  9452 FileSystemUtil.java:837] Ignoring hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/997 since it is either in a hidden directory or a temporary staging directory hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1
      E0730 10:36:27.034335  9452 ParallelFileMetadataLoader.java:171] Loading file and block metadata for 1 paths for table default.my_alltypes encountered an error loading data for path hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4
      Java exception follows:
      java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 does not exist.
              at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
              at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
              at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
              at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:168)
              at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
              at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780)
              at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743)
              at org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904)
              at org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465)
              at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068)
              at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907)
              at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511)
              at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935)
              at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 does not exist.
              at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:1317)
              at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:1288)
              at org.apache.impala.common.FileSystemUtil$RecursingIterator.hasNext(FileSystemUtil.java:948)
              at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:888)
              at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
              at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:162)
              at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
              at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
              at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
              at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
              at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
              at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
              at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
              at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:162)
              ... 17 more
      E0730 10:36:27.034795  9452 MetastoreEventsProcessor.java:841] Event processing needs a invalidate command to resolve the state
      Java exception follows:
      org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException: EventId: 728392 EventType: ADD_PARTITION Failed to refresh newly added partitions of table {}. Event processing cannot continue. Issue an invalidate metadata command to reset event processor.
              at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1921)
              at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511)
              at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935)
              at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.impala.catalog.TableLoadingException: Loading file and block metadata for 1 paths for table default.my_alltypes: failed to load 1 paths. Check the catalog server log for more details.
              at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:185)
              at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
              at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780)
              at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743)
              at org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904)
              at org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465)
              at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068)
              at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907)
              ... 10 more
      I0730 10:36:27.622951  9547 catalog-server.cc:400] A catalog update with 1 entries is assembled. Catalog version: 1697 Last sent catalog version: 1696
      I0730 10:36:27.625458  9536 catalog-server.cc:813] Collected update: 1:CATALOG_SERVICE_ID, version=1698, original size=60, compressed size=58
      W0730 10:36:28.035472  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
      W0730 10:36:29.035904  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
      I0730 10:36:29.623289  9547 catalog-server.cc:400] A catalog update with 1 entries is assembled. Catalog version: 1698 Last sent catalog version: 1697
      W0730 10:36:30.036418  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
      W0730 10:36:31.036957  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
      

      Attachments

        1. Selection_034.png
          505 kB
          Quanlong Huang
        2. hdfs-touch-1002-files.sh
          145 kB
          Quanlong Huang

        Issue Links

          Activity

            People

              stigahuang Quanlong Huang
              stigahuang Quanlong Huang
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: