[SPARK-25399] Reusing execution threads from continuous processing for microbatch streaming can result in correctness issues - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0
Component/s: Structured Streaming
Labels:
- correctness

Description

Continuous processing sets some thread-local variables. When a thread running a microbatch stream later reads them, it may read incorrect previous state, or none at all, and produce wrong answers. This was caught by a job running the StreamSuite tests, and it only reproduces occasionally, when the same threads are reused.

The issue is in StateStoreRDD.compute: when we compute currentVersion, we read from a thread-local variable that is set by continuous processing threads. If this value is still set on the thread, we use it and end up assuming a state version that is wrong for the microbatch.
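
The failure mode can be illustrated outside Spark. The following is a minimal Python sketch of the general pattern only, not Spark's code: the names `_epoch`, `continuous_task`, and `microbatch_version` are hypothetical stand-ins, and the version-selection logic is an assumption made for illustration. It shows a value left in thread-local storage by one task being picked up by an unrelated task that happens to run on the same pooled thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a thread-local epoch value; not Spark's actual API.
_epoch = threading.local()


def continuous_task(epoch):
    """Simulate a continuous-processing task that records its epoch on the thread."""
    _epoch.value = epoch  # deliberately never cleared when the task ends


def microbatch_version(stored_version):
    """Simulate a version lookup that prefers the thread-local value if one is set."""
    leaked = getattr(_epoch, "value", None)
    return leaked if leaked is not None else stored_version


def run_on_shared_thread():
    # A single worker guarantees the second task reuses the first task's thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(continuous_task, 7).result()
        return pool.submit(microbatch_version, 3).result()


def run_on_fresh_thread():
    # A new pool means a new thread with empty thread-local storage.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(microbatch_version, 3).result()


if __name__ == "__main__":
    print(run_on_shared_thread())  # stale value from the earlier task
    print(run_on_fresh_thread())   # the intended stored version
```

On the shared thread the lookup returns the stale value 7 instead of the intended 3, with no error raised, which is why this kind of bug is silent. A common general remedy is to clear the thread-local value in a `finally` block when the task ends; whether that matches the fix in the linked pull request is not stated in this issue.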

I imagine very few people, if any, would run into this bug, because you'd have to use continuous processing and then microbatch processing in the same cluster. However, it can result in silent correctness issues, and it would be very difficult for someone to tell if they were impacted by this or not.

Issue Links

links to

[Github] Pull Request #22386 (mukulmurthy)

People

Assignee: Mukul Murthy

Reporter: Mukul Murthy

Votes: 0

Watchers: 4

Dates

Created: 10/Sep/18 21:21

Updated: 21/Sep/18 07:14

Resolved: 11/Sep/18 22:53