FLINK-6766

Update documentation with async backends and incremental checkpoints

    Details

      Description

      This PR introduces some documentation about async heap backends and incremental snapshots with RocksDB.

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user StefanRRichter opened a pull request:

          https://github.com/apache/flink/pull/4011

          FLINK-6766 Update documentation about async backends and incremental checkpoints

          This PR introduces some documentation about async heap backends and incremental snapshots with RocksDB.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/StefanRRichter/flink backend-docu-update

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/4011.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #4011


          commit 9f26592529eec20a164e5bb8d08cc85a074c0c3b
          Author: Stefan Richter <s.richter@data-artisans.com>
          Date: 2017-05-29T15:18:19Z

          FLINK-6766 Update documentation about async backends and incremental checkpoints


          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on the issue:

          https://github.com/apache/flink/pull/4011

          CC @rmetzger

          githubbot ASF GitHub Bot added a comment -

          Github user alpinegizmo commented on a diff in the pull request:

          https://github.com/apache/flink/pull/4011#discussion_r119344563

          — Diff: docs/ops/state_backends.md —
          @@ -88,6 +102,13 @@ that is (per default) stored in the TaskManager data directories. Upon checkpoin
          RocksDB data base will be checkpointed into the configured file system and directory. Minimal
          metadata is stored in the JobManager's memory (or, in high-availability mode, in the metadata checkpoint).

          +The RocksDBStateBackend always performs asynchronous snapshots.
          +
          +Limitations of the RocksDBStateBackend:
          +
          + - As RocksDB's JNI bridge API is based on byte[], the maximum supported size per key and per value is 2^31 bytes each.
          + IMPORTANT: states that use merge operations in RocksDB (e.g. ListState) can silently accumulate value sizes > 2^31 bytes and will then fail on their next retrival. This is currently a limitation of RocksDB JNI.
          +
          — End diff –

          retrieval

          githubbot ASF GitHub Bot added a comment -

          Github user alpinegizmo commented on a diff in the pull request:

          https://github.com/apache/flink/pull/4011#discussion_r119344615

          — Diff: docs/ops/state_backends.md —
          @@ -88,6 +102,13 @@ that is (per default) stored in the TaskManager data directories. Upon checkpoin
          RocksDB data base will be checkpointed into the configured file system and directory. Minimal
          metadata is stored in the JobManager's memory (or, in high-availability mode, in the metadata checkpoint).

          +The RocksDBStateBackend always performs asynchronous snapshots.
          +
          +Limitations of the RocksDBStateBackend:
          +
          + - As RocksDB's JNI bridge API is based on byte[], the maximum supported size per key and per value is 2^31 bytes each.
          + IMPORTANT: states that use merge operations in RocksDB (e.g. ListState) can silently accumulate value sizes > 2^31 bytes and will then fail on their next retrival. This is currently a limitation of RocksDB JNI.
          — End diff –

          retrieval
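
The size limit quoted in the diff above follows from Java arrays being indexed by `int`, so a single `byte[]` cannot hold more than `Integer.MAX_VALUE` (2^31 - 1) elements. A minimal sketch of how an application might track the accumulated size of a merged value before it becomes unreadable is shown below. The class `MergedValueSizeGuard` and its methods are hypothetical and are not part of Flink or RocksDB; the sketch only models the arithmetic and does not talk to a state backend.

```java
/**
 * Hypothetical helper, not part of Flink or RocksDB: tracks how many bytes
 * have been appended to one merged value (for example, one ListState entry).
 */
public class MergedValueSizeGuard {

    /** Upper bound for the length of a Java byte[]: 2^31 - 1. */
    static final long MAX_VALUE_BYTES = Integer.MAX_VALUE;

    private long accumulatedBytes = 0L;

    /**
     * Accepts the element only if the merged value would still fit into a
     * single byte[]; otherwise the running total is left unchanged.
     */
    public boolean tryAdd(long elementBytes) {
        if (elementBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        // Compare against the remaining capacity to avoid long overflow.
        if (elementBytes > MAX_VALUE_BYTES - accumulatedBytes) {
            return false;
        }
        accumulatedBytes += elementBytes;
        return true;
    }

    public long accumulatedBytes() {
        return accumulatedBytes;
    }

    public static void main(String[] args) {
        MergedValueSizeGuard guard = new MergedValueSizeGuard();
        long oneGiB = 1L << 30;
        System.out.println(guard.tryAdd(oneGiB));      // true
        System.out.println(guard.tryAdd(oneGiB - 1));  // true, total is now 2^31 - 1
        System.out.println(guard.tryAdd(1));           // false, would exceed the bound
    }
}
```

In a real job the element size would have to come from the serializer, and the JVM's practical array limit can be slightly below `Integer.MAX_VALUE`, so a lower threshold is the safer assumption.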

          githubbot ASF GitHub Bot added a comment -

          Github user alpinegizmo commented on a diff in the pull request:

          https://github.com/apache/flink/pull/4011#discussion_r119344251

          — Diff: docs/monitoring/large_state_tuning.md —
          @@ -138,6 +138,22 @@ Unfortunately, RocksDB's performance can vary with configuration, and there is l
          RocksDB properly. For example, the default configuration is tailored towards SSDs and performs suboptimal
          on spinning disks.

          +*Incremental Checkpoints*
          +
          +Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, at the cost of a (potentially) longer
          +recovery time. The core idea is that incremental checkpoints only record all changes to the previous completed checkpoint, instead of
          +producing a full, self-contained backups of the backend. Like this, incremental checkpoints build upon previous checkpoints. Flink leverages
          — End diff –

          a full, self-contained backup of the state backend
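
The trade-off described in the diff above (less data written per checkpoint, potentially more work on recovery) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `IncrementalCheckpointSketch` is hypothetical, it keeps state in an in-memory map, ignores deletions, and is not how Flink or RocksDB implement incremental checkpoints.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of incremental checkpoints; not Flink's implementation. */
public class IncrementalCheckpointSketch {

    private final Map<String, Long> state = new HashMap<>();
    private final Set<String> dirtyKeys = new HashSet<>();
    /** Chain of checkpoints; each entry holds only the changes since the previous one. */
    private final List<Map<String, Long>> deltas = new ArrayList<>();

    public void put(String key, long value) {
        state.put(key, value);
        dirtyKeys.add(key);
    }

    /** Records only the keys changed since the last checkpoint; returns how many were written. */
    public int checkpoint() {
        Map<String, Long> delta = new HashMap<>();
        for (String key : dirtyKeys) {
            delta.put(key, state.get(key));
        }
        dirtyKeys.clear();
        deltas.add(delta);
        return delta.size();
    }

    /** Recovery replays the whole chain in order, which is why it can take longer. */
    public Map<String, Long> restore() {
        Map<String, Long> restored = new HashMap<>();
        for (Map<String, Long> delta : deltas) {
            restored.putAll(delta);
        }
        return restored;
    }

    public static void main(String[] args) {
        IncrementalCheckpointSketch backend = new IncrementalCheckpointSketch();
        backend.put("a", 1L);
        backend.put("b", 2L);
        System.out.println(backend.checkpoint()); // 2 entries written
        backend.put("b", 3L);
        System.out.println(backend.checkpoint()); // 1 entry written
        System.out.println(backend.restore().equals(backend.state)); // true
    }
}
```

A full checkpoint would write every entry each time; here the second checkpoint writes one entry instead of two, while `restore()` has to read both deltas.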

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on the issue:

          https://github.com/apache/flink/pull/4011

          Thanks for the input @alpinegizmo ! I updated the text accordingly.

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/4011

          I'll merge the change.

          rmetzger Robert Metzger added a comment -

          Resolved for master (1.4) in http://git-wip-us.apache.org/repos/asf/flink/commit/88545130
          Resolved for 1.3 in http://git-wip-us.apache.org/repos/asf/flink/commit/dd1c05b1

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/4011


            People

            • Assignee:
              srichter Stefan Richter
              Reporter:
              srichter Stefan Richter