[SPARK-42565] Error log improvement for the lock acquisition of RocksDB state store instance - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 3.5.0
Component/s: Structured Streaming
Labels: None

Description

23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default): RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1 in stage 57, TID 342] after 60002 ms.

We see these log messages for a test query. The task ID is not the same as the partition ID, but the error log does not make this clear.

These logs are confusing: the second entry seems to refer to `task 3.0` (it is actually partition 3, retry attempt 0), yet according to the first entry `TID 363` already belongs to `task 2.0 in stage 57.1`.

Also, it is unclear during which stage retry attempt the lock is acquired (or fails to be acquired).
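
To illustrate the kind of message that would avoid this ambiguity, here is a minimal sketch in Python. It is not the change made in the linked pull request; the `TaskInfo` class, its field names, and the output format are hypothetical and only show how labelling the partition and including the stage attempt makes each number unambiguous. The holder's stage attempt does not appear in the log above, so the value used for it is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical holder for the identifiers the log line should distinguish."""
    thread_id: int
    partition_id: int          # partition index within the stage
    attempt_number: int        # retry attempt of this partition's task
    stage_id: int
    stage_attempt_number: int  # retry attempt of the stage itself
    task_attempt_id: int       # the TID shown in executor logs

    def describe(self) -> str:
        # Label the partition explicitly and always print the stage attempt.
        return (
            f"[ThreadId: {self.thread_id}, "
            f"partition {self.partition_id}.{self.attempt_number} "
            f"in stage {self.stage_id}.{self.stage_attempt_number}, "
            f"TID {self.task_attempt_id}]"
        )


def lock_timeout_message(waiter: TaskInfo, holder: TaskInfo, waited_ms: int) -> str:
    """Build an error message for a lock that could not be acquired in time."""
    return (
        f"RocksDB instance could not be acquired by {waiter.describe()} "
        f"as it was not released by {holder.describe()} after {waited_ms} ms."
    )


# Values for the waiter come from the two log lines in the description.
waiter = TaskInfo(thread_id=49, partition_id=3, attempt_number=0,
                  stage_id=57, stage_attempt_number=1, task_attempt_id=363)
# The holder's stage attempt is not in the log; 0 is assumed for illustration.
holder = TaskInfo(thread_id=51, partition_id=3, attempt_number=1,
                  stage_id=57, stage_attempt_number=0, task_attempt_id=342)

message = lock_timeout_message(waiter, holder, 60002)
print(message)
```

With a format like this, a reader can match `TID 363` against the executor's `Running task` line and see at once that `3.0` names a partition and its attempt, not a task ID.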

Issue Links

links to

[Github] Pull Request #40161 (huanliwang-db)

People

Assignee: Huanli Wang

Reporter: Huanli Wang

Votes: 0

Watchers: 3

Dates

Created: 24/Feb/23 18:47

Updated: 24/Feb/23 22:54

Resolved: 24/Feb/23 22:54