Flink / FLINK-13034

Improve the performance when checking whether mapstate is empty for RocksDBStateBackend

    Details

    • Release Note:
      We have added a new method MapState#isEmpty() which enables users to check whether a map state is empty. The new method is 40% faster than mapState.keys().iterator().hasNext() when using the RocksDB state backend.

      Description

      Currently, several places in the Flink source code check whether a map state is empty, e.g. TemporalRowTimeJoinOperator and AbstractRowTimeUnboundedPrecedingOver.
      Developers typically use the following statement to check whether the map state is empty:

      boolean noRecordsToProcess = !inputState.keys().iterator().hasNext();
      

      However, with RocksDBStateBackend, inputState.keys().iterator().hasNext() actually triggers 1 seek and 128 next operations in RocksDBMapState. Only the seek is needed to answer the question; the next operations are redundant.
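      To make the cost concrete, here is a minimal sketch of an iterator that eagerly fills a cache on its first hasNext() call. It is a toy model, not Flink or RocksDB code: CountingStore, EagerKeyIterator and SeekCostDemo are hypothetical names, a TreeMap stands in for RocksDB, and the only figure taken from this issue is the cache size of 128.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy stand-in for a sorted key-value store; it only counts seek/next calls. */
class CountingStore {
    private final TreeMap<String, String> data = new TreeMap<>();
    int seeks = 0;
    int nexts = 0;

    void put(String key, String value) {
        data.put(key, value);
    }

    /** Positions the cursor at the first key >= prefix (counts as one seek). */
    String seek(String prefix) {
        seeks++;
        return data.ceilingKey(prefix);
    }

    /** Moves the cursor to the key after 'current' (counts as one next). */
    String next(String current) {
        nexts++;
        return data.higherKey(current);
    }
}

/** Hypothetical eager iterator: the first hasNext() loads up to CACHE_SIZE keys. */
class EagerKeyIterator {
    static final int CACHE_SIZE = 128; // figure quoted in this issue

    private final CountingStore store;
    private final String prefix;
    private List<String> cache; // null until the first hasNext() call

    EagerKeyIterator(CountingStore store, String prefix) {
        this.store = store;
        this.prefix = prefix;
    }

    boolean hasNext() {
        if (cache == null) {
            cache = new ArrayList<>();
            String key = store.seek(prefix);
            // Keep reading even though one key already answers "is it empty?".
            while (key != null && key.startsWith(prefix) && cache.size() < CACHE_SIZE) {
                cache.add(key);
                key = store.next(key);
            }
        }
        return !cache.isEmpty();
    }
}

class SeekCostDemo {
    /** Returns {seeks, nexts, empty ? 1 : 0} for a state holding 'entries' keys. */
    static int[] measure(int entries) {
        CountingStore store = new CountingStore();
        for (int i = 0; i < entries; i++) {
            store.put(String.format("k-%05d", i), "v");
        }
        boolean empty = !new EagerKeyIterator(store, "k-").hasNext();
        return new int[] {store.seeks, store.nexts, empty ? 1 : 0};
    }

    public static void main(String[] args) {
        int[] r = measure(1000);
        System.out.println("seeks=" + r[0] + ", nexts=" + r[1]); // seeks=1, nexts=128
    }
}
```

      In this model the emptiness check on a large state pays for 128 next calls whose results are thrown away, while an empty state costs only the single seek.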

      I see two options to improve this:

      • Revert RocksDBMapState to the previous design, which first loads one element and then loads more elements in follow-up queries. However, this would affect the performance of other map state methods.
      • Add an isEmpty() method to the public evolving interface MapState, so that we can check whether the map state is empty without any redundant RocksDB operations.

      I prefer the second option.
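      The second option could look roughly like the sketch below. It is an illustration under stated assumptions rather than the actual Flink change: SimpleMapState and PrefixedMapState are hypothetical, cut-down stand-ins for MapState and RocksDBMapState, a TreeMap plays the role of RocksDB, and a string prefix plays the role of the serialized key and namespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, cut-down map-state interface; not Flink's actual MapState. */
interface SimpleMapState {
    void put(String userKey, String value);

    List<String> keys();

    /** Returns true if this state holds no entries. */
    boolean isEmpty();
}

/** Map state stored in a shared sorted store under a per-state key prefix. */
class PrefixedMapState implements SimpleMapState {
    private final TreeMap<String, String> store; // stands in for RocksDB
    private final String prefix;                 // stands in for key + namespace bytes

    PrefixedMapState(TreeMap<String, String> store, String prefix) {
        this.store = store;
        this.prefix = prefix;
    }

    @Override
    public void put(String userKey, String value) {
        store.put(prefix + userKey, value);
    }

    @Override
    public List<String> keys() {
        List<String> result = new ArrayList<>();
        for (String k : store.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) {
                break; // left this state's key range
            }
            result.add(k.substring(prefix.length()));
        }
        return result;
    }

    @Override
    public boolean isEmpty() {
        // One seek: find the first stored key at or after the prefix.
        String first = store.ceilingKey(prefix);
        // Empty if nothing follows, or if the next key belongs to another state.
        return first == null || !first.startsWith(prefix);
    }
}

class IsEmptyDemo {
    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        SimpleMapState inputState = new PrefixedMapState(store, "key-1#");

        // Call site from the description, rewritten to use isEmpty().
        boolean noRecordsToProcess = inputState.isEmpty();
        System.out.println(noRecordsToProcess); // true

        inputState.put("row", "value");
        System.out.println(inputState.isEmpty()); // false
    }
}
```

      The prefix check matters because the seek may land on an entry that belongs to a different key or namespace; such a hit still means the current map state is empty.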

       

              People

              • Assignee: Yun Tang (yunta)
              • Reporter: Yun Tang (yunta)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m