Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-33881

[TtlListState]Avoid copy and update value in TtlListState#getUnexpiredOrNull

    XMLWordPrintableJSON

Details

    Description

      In some scenarios, 'TtlListState#getUnexpiredOrNull -> elementSerializer.copy(ttlValue)'  consumes a lot of cpu resources.

      I found that for TtlListState#getUnexpiredOrNull, if none of the elements have expired, it still needs to copy all the elements and update the whole list/map in TtlIncrementalCleanup#runCleanup();

      I think we could optimize TtlListState#getUnexpiredOrNull by:
      1)find the first expired element index in the list;
      2)If not found, return to the original list;
      3)If found, then constrct the unexpire list (puts the previous elements into the list), and go through the subsequent elements, adding expired elements into the list.

      public List<TtlValue<T>> getUnexpiredOrNull(@Nonnull List<TtlValue<T>> ttlValues) {
          //.......
          int firstExpireIndex = -1;
          for (int i = 0; i < ttlValues.size(); i++) {
              if (TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
                  firstExpireIndex = i;
                  break;
              }
          }
          if (firstExpireIndex == -1) {
              return ttlValues;  //return the original ttlValues
          }
          List<TtlValue<T>> unexpired = new ArrayList<>(ttlValues.size());
          for (int i = 0; i < ttlValues.size(); i++) {
              if (i < firstExpireIndex) {
                  // unexpired.add(ttlValues.get(i));
                  unexpired.add(elementSerializer.copy(ttlValues.get(i)));
              }
              if (i > firstExpireIndex) {
                  if (!TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
                      // unexpired.add(ttlValues.get(i));
                      unexpired.add(elementSerializer.copy(ttlValues.get(i)));
                  }
              }
          }
          //  .....
      } 

      In this way, the extra iteration overhead is actually very very small, but the benefit when there are no expired elements is significant.

      Attachments

        1. image-2023-12-19-21-25-21-446.png
          1.16 MB
          Jinzhong Li
        2. image-2023-12-19-21-26-43-518.png
          1.77 MB
          Jinzhong Li

        Issue Links

          Activity

            People

              lijinzhong Jinzhong Li
              lijinzhong Jinzhong Li
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: