Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.6.0, 0.5.2
Description
`PartitionFilesSorter` sorts shuffle files in `worker-file-sorter-executor` thread and cleans up expired key in `worker-expired-shuffle-cleaner` thread. There is a case that after `worker-expired-shuffle-cleaner` cleaning up expired shuffle key, `worker-file-sorter-executor` updates sorted shuffle files, which causes NullPointerException at present.
2024-10-23 17:26:17,162 [INFO] [worker-expired-shuffle-cleaner] - org.apache.celeborn.service.deploy.worker.Worker -Logging.scala(51) -Cleaned up expired shuffle application_1724141892576_3843182_1-0 2024-10-23 17:26:17,392 [ERROR] [worker-file-sorter-executor-237572] - org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter -PartitionFilesSorter.java(752) -Sorting shuffle file for application_1724141892576_3843182_1-0-1875-0-0 /mnt/storage02/celeborn-worker/shuffle_data/application_1724141892576_3843182_1/0/1875-0-0 failed, detail: java.lang.NullPointerException: null at org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter.updateSortedShuffleFiles(PartitionFilesSorter.java:455) ~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT] at org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter$FileSorter.sort(PartitionFilesSorter.java:747) ~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT] at org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter.lambda$new$1(PartitionFilesSorter.java:164) ~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_162] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_162] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_162]
Attachments
Issue Links
- links to