SPARK-48105: Fix the data corruption issue when state store unload and snapshotting happen concurrently for the HDFS state store



    Description

      There are two race conditions between state store snapshotting and state store unloading that could result in query failure and potential data corruption.

       

      Case 1:

      1. The maintenance thread pool encounters an issue and calls stopMaintenanceTask, which in turn calls threadPool.stop. However, it does not wait for the stop operation to complete before moving on to unload and clear the state stores.
      2. The provider unload closes the state store, which clears the values in loadedMaps for the HDFS-backed state store.
      3. The not-yet-stopped maintenance thread may still be running and attempting the snapshot, but the data in the underlying `HDFSBackedStateStoreMap` has already been removed. If this snapshot completes successfully, we write corrupted data and the following batches consume that corrupted data (see the sketch after this list).
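      The sketch below is a minimal, self-contained Scala reproduction of this pattern, not the actual Spark code: `storeMap`, `loadedMaps`, and the single-thread pool are simplified stand-ins. It shows how requesting shutdown without awaiting termination lets an in-flight snapshot task race with the clear done during unload.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}

object Case1RaceSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-ins for one store's in-memory data and for loadedMaps.
    val storeMap   = new ConcurrentHashMap[String, String]()
    val loadedMaps = new ConcurrentHashMap[Long, ConcurrentHashMap[String, String]]()
    (1 to 1000).foreach(i => storeMap.put(s"key-$i", s"value-$i"))
    loadedMaps.put(1L, storeMap)

    val maintenancePool = Executors.newSingleThreadExecutor()
    // Snapshot task: writes out whatever it currently sees in the map.
    maintenancePool.submit(new Runnable {
      override def run(): Unit = {
        Thread.sleep(50) // give the unload below a chance to win the race
        println(s"snapshot written with ${storeMap.size()} entries") // may print 0
      }
    })

    // Bug pattern: shutdown is requested but termination is never awaited,
    // so the snapshot task above may still be running when we clear the data.
    maintenancePool.shutdown()
    // Missing: maintenancePool.awaitTermination(...)

    // Provider unload: clears the active data while the snapshot may be in flight.
    storeMap.clear()
    loadedMaps.remove(1L)
  }
}
```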

      Case 2:

      1. On executor_1, the maintenance thread is about to snapshot state_store_1. It retrieves the `HDFSBackedStateStoreMap` object from loadedMaps and then releases the lock on loadedMaps.
      2. state_store_1 is loaded on another executor, e.g. executor_2.
      3. Another state store, state_store_2, is loaded on executor_1 and reportActiveStoreInstance is sent to the driver.
      4. executor_1 unloads the state stores that are no longer active, which clears the data entries in the `HDFSBackedStateStoreMap`.
      5. The snapshotting thread terminates early and uploads the incomplete snapshot to the cloud, because the iterator has no next element after the clear (see the sketch after this list).
      6. Future batches consume the corrupted data.
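      The following minimal Scala sketch (again with hypothetical stand-ins rather than the real Spark classes) illustrates case 2: the snapshotting thread keeps a reference to the map after releasing the lock on loadedMaps, the unload path clears that map concurrently, and the weakly consistent iterator simply runs out of elements, producing an incomplete snapshot.

```scala
import java.util.concurrent.ConcurrentHashMap

object Case2RaceSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-ins for loadedMaps and state_store_1's in-memory data.
    val loadedMaps = new ConcurrentHashMap[String, ConcurrentHashMap[String, String]]()
    val storeMap   = new ConcurrentHashMap[String, String]()
    (1 to 1000).foreach(i => storeMap.put(s"key-$i", s"value-$i"))
    loadedMaps.put("state_store_1", storeMap)

    // Maintenance thread: grabs a reference, then iterates without holding any lock.
    val snapshotter = new Thread(() => {
      val mapRef = loadedMaps.get("state_store_1") // reference escapes the lock scope
      var written = 0
      val it = mapRef.entrySet().iterator()
      while (it.hasNext) {        // the iterator goes empty once clear() runs
        it.next()
        written += 1
        Thread.sleep(1)           // widen the race window for the demo
      }
      println(s"uploaded snapshot with $written of 1000 entries") // likely < 1000
    })

    // Unload path on the same executor: the store is no longer active here,
    // so its data entries are cleared while the snapshot is still iterating.
    val unloader = new Thread(() => {
      Thread.sleep(20)
      storeMap.clear()
      loadedMaps.remove("state_store_1")
    })

    snapshotter.start(); unloader.start()
    snapshotter.join(); unloader.join()
  }
}
```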

       

      Proposed fix:

      • When we close the HDFS state store, we should only remove the entry from `loadedMaps` rather than actively cleaning up the data; the JVM GC will reclaim those objects for us.
      • We should wait for the maintenance thread to stop before unloading the providers (a sketch of both changes follows).
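      Below is a rough Scala sketch of the two proposed changes, with hypothetical names that do not match the actual patch: closing a store only drops its `loadedMaps` entry so that any in-flight snapshot keeps its own consistent reference and GC reclaims the map later, and provider unloading runs only after the maintenance pool has fully terminated.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

// Hypothetical sketch of the two proposed changes; names do not match the real code.
object ProposedFixSketch {
  private val loadedMaps = new ConcurrentHashMap[Long, ConcurrentHashMap[String, String]]()
  private val maintenancePool = Executors.newSingleThreadExecutor()

  // (1) Closing a store only removes the loadedMaps entry; the map itself is not
  //     cleared, so a snapshot that already holds a reference still sees consistent
  //     data, and the JVM GC reclaims the map once nothing references it.
  def closeStore(version: Long): Unit = {
    loadedMaps.remove(version) // no map.clear() here
  }

  // (2) Unloading providers first waits for the maintenance tasks to finish,
  //     so no snapshot can race with the unload.
  def stopMaintenanceAndUnload(unloadProviders: () => Unit): Unit = {
    maintenancePool.shutdown()
    if (!maintenancePool.awaitTermination(30, TimeUnit.SECONDS)) {
      maintenancePool.shutdownNow()
    }
    unloadProviders() // safe: the maintenance thread has stopped
  }

  def main(args: Array[String]): Unit = {
    closeStore(1L)
    stopMaintenanceAndUnload(() => println("providers unloaded"))
  }
}
```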

       

      Thanks anishshri-db for helping debug this issue!
