[APEXMALHAR-2223] Managed state should parallelize WAL writes - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.4.0
Fix Version/s: None
Component/s: None
Labels: None

Description

Currently, data is accumulated in memory and written to the WAL only on checkpoint. This causes a write spike on checkpoint and does not utilize the HDFS write pipeline. The other extreme is to write to the WAL as soon as data arrives and only flush in beforeCheckpoint. The downside of that approach is that when the same key is written many times, all duplicates end up in the WAL. We need to find a balanced approach that the user can potentially fine-tune.
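
One possible middle ground, sketched here only as an illustration and not as the approach taken in the linked pull request: keep a small in-memory buffer that collapses repeated writes to the same key, hand the buffer to the WAL whenever it reaches a user-configurable size, and flush in beforeCheckpoint. The names `ThresholdWalBuffer`, `Wal`, and `maxBufferedKeys` are hypothetical and are not part of the Malhar managed state API; keys and values are plain strings to keep the sketch self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a size-bounded write buffer in front of a WAL. */
class ThresholdWalBuffer {

    /** Hypothetical stand-in for the real WAL writer. */
    interface Wal {
        void append(String key, String value); // hand an entry to the write pipeline
        void flush();                          // make appended entries durable
    }

    private final Wal wal;
    private final int maxBufferedKeys; // the user-tunable knob (assumption)
    private final Map<String, String> buffer = new LinkedHashMap<>();

    ThresholdWalBuffer(Wal wal, int maxBufferedKeys) {
        if (maxBufferedKeys < 1) {
            throw new IllegalArgumentException("maxBufferedKeys must be >= 1");
        }
        this.wal = wal;
        this.maxBufferedKeys = maxBufferedKeys;
    }

    /** Buffers a write; a later write to the same key replaces the earlier one. */
    void put(String key, String value) {
        buffer.put(key, value);
        if (buffer.size() >= maxBufferedKeys) {
            drain(); // spread WAL writes across the window instead of one spike
        }
    }

    /** Writes whatever is still buffered, then flushes the WAL. */
    void beforeCheckpoint() {
        drain();
        wal.flush();
    }

    private void drain() {
        for (Map.Entry<String, String> e : buffer.entrySet()) {
            wal.append(e.getKey(), e.getValue());
        }
        buffer.clear();
    }
}
```

With a threshold of 1 this degenerates to writing every update immediately; with a very large threshold it behaves like the current write-on-checkpoint behavior. Duplicates are only collapsed within one buffer generation, so a key that is rewritten after a drain still appears more than once in the WAL.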

Issue Links

links to GitHub Pull Request #438

People

Assignee: Chandni Singh

Reporter: Thomas Weise

Votes: 0

Watchers: 1

Dates

Created: 03/Sep/16 00:11

Updated: 02/Oct/17 16:13