Apache Apex Malhar / APEXMALHAR-2223

Managed state should parallelize WAL writes


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 3.4.0
    • None
    • None
    • None

    Description

      Currently, data is accumulated in memory and written to the WAL only on checkpoint. This causes a write spike at checkpoint time and does not make use of the HDFS write pipeline between checkpoints. The other extreme is to write to the WAL as soon as data arrives and only flush in beforeCheckpoint. The downside of that approach is that when the same key is written many times, every duplicate ends up in the WAL. A balanced approach is needed, ideally one that the user can fine-tune.
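      One way to picture a middle ground is a bounded, deduplicating buffer in front of the WAL: repeated writes to the same key are coalesced in memory, the buffer is spilled to the WAL whenever it reaches a tunable size, and beforeCheckpoint drains and flushes whatever is left. The sketch below is a minimal Python model of that idea, not Malhar's managed state implementation. It assumes the tuning knob is the number of distinct buffered keys; `BufferedWalWriter`, `max_buffered_keys`, `before_checkpoint`, `replay` and `wal_records` are hypothetical names, and the in-memory list stands in for the WAL file on HDFS.

```python
from typing import Dict, List, Optional, Tuple


class BufferedWalWriter:
    """Hypothetical bounded, deduplicating buffer in front of a WAL.

    Illustrative only; not Malhar's managed state API.
    """

    def __init__(self, max_buffered_keys: Optional[int]) -> None:
        # None: never spill early (today's write-on-checkpoint behavior).
        # 1: spill on every put (the write-through extreme).
        self.max_buffered_keys = max_buffered_keys
        self.buffer: Dict[str, str] = {}
        # In-memory stand-in for the WAL file on HDFS.
        self.wal: List[Tuple[str, str]] = []
        # Number of WAL records known to be durable after the last flush.
        self.durable_records = 0

    def put(self, key: str, value: str) -> None:
        # Re-putting a buffered key replaces its value, so duplicates
        # inside one buffer window never reach the WAL.
        self.buffer[key] = value
        limit = self.max_buffered_keys
        if limit is not None and len(self.buffer) >= limit:
            self._spill()

    def _spill(self) -> None:
        # Append buffered entries to the WAL stream; on HDFS this would be a
        # plain write, letting the pipeline stream data between checkpoints.
        self.wal.extend(self.buffer.items())
        self.buffer.clear()

    def before_checkpoint(self) -> None:
        # Drain what is left and mark everything durable; on HDFS this is
        # where an hflush()/hsync() on the output stream would go.
        self._spill()
        self.durable_records = len(self.wal)

    def replay(self) -> Dict[str, str]:
        # Last write wins: later WAL records overwrite earlier ones.
        return dict(self.wal)


def wal_records(max_buffered_keys: Optional[int],
                writes: List[Tuple[str, str]]) -> int:
    """Count WAL records produced for one checkpoint window."""
    writer = BufferedWalWriter(max_buffered_keys)
    for key, value in writes:
        writer.put(key, value)
    writer.before_checkpoint()
    return len(writer.wal)


writes = [("a", "1"), ("b", "1"), ("a", "2"), ("a", "3"), ("c", "1"), ("a", "4")]
for limit in (None, 2, 1):
    print(limit, wal_records(limit, writes))
# None 3
# 2 5
# 1 6
```

      With `None` the model reproduces the current checkpoint-only behavior (3 records, maximal deduplication, all I/O at checkpoint), and with `1` it becomes write-through (6 records, every duplicate logged). Intermediate values trade memory and deduplication for a smoother write load; deduplication only applies within one buffer window, so a key rewritten after a spill is logged again, which replay tolerates because the last record wins. A byte-size or time-based threshold would be an equally plausible knob.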

          People

            csingh Chandni Singh
            thw Thomas Weise
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: