Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-17928

Incorrect state size reported when using unaligned checkpoints

    XMLWordPrintableJSON

Details

    Description

      Even when checkpoints on HDFS are between 100-300MBs, the reported state size is in orders of magnitude larger with values like:

      1GiB  1.5TiB  2.0TiB  2.1TiB  2.1TiB
      148GiB  148GiB  148GiB  148GiB  148GiB  148GiB
      

      it's probably because we have multiple Collection<InputChannelStateHandle>, and each of the individual handle returns the same value from AbstractChannelStateHandle#getStateSize - the full size of the spilled data, ignoring that only small portion of those data belong to a single input channel/result subpartition. In other words {{
      org.apache.flink.runtime.state.AbstractChannelStateHandle#getStateSize}} should be taking the offsets into account and return only the size of the data that belong exclusively to this handle.

      Attachments

        Issue Links

          Activity

            People

              roman Roman Khachatryan
              pnowojski Piotr Nowojski
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: